
chore: update torch v2.5.1 #1849

Merged 6 commits into main on Nov 17, 2024

Conversation

@zhyncs (Member) commented Oct 31, 2024

Motivation

ref https://github.com/sgl-project/sglang/actions/runs/11605237798/job/32315285934

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs marked this pull request as draft October 31, 2024 02:49
@merrymercy (Contributor) commented

cc @jerryzh168 who needs torch 2.5

(Review thread on python/pyproject.toml — outdated, resolved)
@fengyang95 commented

@zhyncs In this case, does torch.compile now support FP8?

@zhyncs zhyncs marked this pull request as ready for review November 15, 2024 09:38
@zhyncs zhyncs marked this pull request as ready for review November 16, 2024 19:27
@zhyncs (Member Author) commented Nov 17, 2024

Currently, the performance of non-streaming small-batch-size cases is below expectations. Ref #1563. cc @Ying1123

@zhyncs (Member Author) commented Nov 17, 2024

[
  {
    "timestamp": "2024-11-17T07:25:49.048134",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "metrics": {
      "en": 0.836,
      "en:std": 0.3702755730533679,
      "group_latin": 0.836,
      "group_latin:std": 0.3702755730533679,
      "score:std": 0.3702755730533679,
      "score": 0.836
    },
    "score": 0.836
  },
  {
    "timestamp": "2024-11-17T07:26:37.000698",
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "metrics": {
      "en": 0.604,
      "en:std": 0.48906441293555597,
      "group_latin": 0.604,
      "group_latin:std": 0.48906441293555597,
      "score:std": 0.48906441293555597,
      "score": 0.604
    },
    "score": 0.604
  },
  {
    "timestamp": "2024-11-17T07:28:02.777706",
    "model": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    "metrics": {
      "en": 0.876,
      "en:std": 0.3295815528818322,
      "group_latin": 0.876,
      "group_latin:std": 0.3295815528818322,
      "score:std": 0.3295815528818322,
      "score": 0.876
    },
    "score": 0.876
  },
  {
    "timestamp": "2024-11-17T07:29:14.852606",
    "model": "google/gemma-2-27b-it",
    "metrics": {
      "en": 0.924,
      "en:std": 0.26499811320083017,
      "group_latin": 0.924,
      "group_latin:std": 0.26499811320083017,
      "score:std": 0.26499811320083017,
      "score": 0.924
    },
    "score": 0.924
  },
  {
    "timestamp": "2024-11-17T07:32:26.439859",
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "metrics": {
      "en": 0.976,
      "en:std": 0.15304901175767194,
      "group_latin": 0.976,
      "group_latin:std": 0.15304901175767194,
      "score:std": 0.15304901175767194,
      "score": 0.976
    },
    "score": 0.976
  },
  {
    "timestamp": "2024-11-17T07:34:16.554962",
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "metrics": {
      "en": 0.648,
      "en:std": 0.4775939698111775,
      "group_latin": 0.648,
      "group_latin:std": 0.4775939698111775,
      "score:std": 0.4775939698111775,
      "score": 0.648
    },
    "score": 0.648
  },
  {
    "timestamp": "2024-11-17T07:35:54.798802",
    "model": "Qwen/Qwen2-57B-A14B-Instruct",
    "metrics": {
      "en": 0.884,
      "en:std": 0.32022492095400695,
      "group_latin": 0.884,
      "group_latin:std": 0.32022492095400695,
      "score:std": 0.32022492095400695,
      "score": 0.884
    },
    "score": 0.884
  },
  {
    "timestamp": "2024-11-17T07:37:30.045613",
    "model": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    "metrics": {
      "en": 0.86,
      "en:std": 0.3469870314579494,
      "group_latin": 0.86,
      "group_latin:std": 0.3469870314579494,
      "score:std": 0.3469870314579494,
      "score": 0.86
    },
    "score": 0.86
  },
  {
    "timestamp": "2024-11-17T07:38:26.084096",
    "model": "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",
    "metrics": {
      "en": 0.88,
      "en:std": 0.32496153618543844,
      "group_latin": 0.88,
      "group_latin:std": 0.32496153618543844,
      "score:std": 0.32496153618543844,
      "score": 0.88
    },
    "score": 0.88
  },
  {
    "timestamp": "2024-11-17T07:39:14.548670",
    "model": "neuralmagic/Mistral-7B-Instruct-v0.3-FP8",
    "metrics": {
      "en": 0.552,
      "en:std": 0.4972886485734417,
      "group_latin": 0.552,
      "group_latin:std": 0.4972886485734417,
      "score:std": 0.4972886485734417,
      "score": 0.552
    },
    "score": 0.552
  },
  {
    "timestamp": "2024-11-17T07:40:51.686684",
    "model": "neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8",
    "metrics": {
      "en": 0.888,
      "en:std": 0.31536645351083237,
      "group_latin": 0.888,
      "group_latin:std": 0.31536645351083237,
      "score:std": 0.31536645351083237,
      "score": 0.888
    },
    "score": 0.888
  },
  {
    "timestamp": "2024-11-17T07:41:42.733429",
    "model": "neuralmagic/gemma-2-2b-it-FP8",
    "metrics": {
      "en": 0.612,
      "en:std": 0.4872945721019269,
      "group_latin": 0.612,
      "group_latin:std": 0.4872945721019269,
      "score:std": 0.4872945721019269,
      "score": 0.612
    },
    "score": 0.612
  },
  {
    "timestamp": "2024-11-17T07:43:15.627071",
    "model": "neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    "metrics": {
      "en": 0.964,
      "en:std": 0.18629009635512025,
      "group_latin": 0.964,
      "group_latin:std": 0.18629009635512025,
      "score:std": 0.18629009635512025,
      "score": 0.964
    },
    "score": 0.964
  },
  {
    "timestamp": "2024-11-17T07:44:52.078680",
    "model": "neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8",
    "metrics": {
      "en": 0.624,
      "en:std": 0.48438001610305936,
      "group_latin": 0.624,
      "group_latin:std": 0.48438001610305936,
      "score:std": 0.48438001610305936,
      "score": 0.624
    },
    "score": 0.624
  },
  {
    "timestamp": "2024-11-17T07:46:25.651174",
    "model": "neuralmagic/Qwen2-72B-Instruct-FP8",
    "metrics": {
      "en": 0.948,
      "en:std": 0.2220270253820467,
      "group_latin": 0.948,
      "group_latin:std": 0.2220270253820467,
      "score:std": 0.2220270253820467,
      "score": 0.948
    },
    "score": 0.948
  },
  {
    "timestamp": "2024-11-17T07:48:11.049916",
    "model": "neuralmagic/Qwen2-57B-A14B-Instruct-FP8",
    "metrics": {
      "en": 0.824,
      "en:std": 0.380820167533181,
      "group_latin": 0.824,
      "group_latin:std": 0.380820167533181,
      "score:std": 0.380820167533181,
      "score": 0.824
    },
    "score": 0.824
  },
  {
    "timestamp": "2024-11-17T07:50:01.784188",
    "model": "neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8",
    "metrics": {
      "en": 0.876,
      "en:std": 0.3295815528818322,
      "group_latin": 0.876,
      "group_latin:std": 0.3295815528818322,
      "score:std": 0.3295815528818322,
      "score": 0.876
    },
    "score": 0.876
  },
  {
    "timestamp": "2024-11-17T07:51:11.661882",
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    "metrics": {
      "en": 0.848,
      "en:std": 0.35902089075707005,
      "group_latin": 0.848,
      "group_latin:std": 0.35902089075707005,
      "score:std": 0.35902089075707005,
      "score": 0.848
    },
    "score": 0.848
  },
  {
    "timestamp": "2024-11-17T07:52:26.298293",
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4",
    "metrics": {
      "en": 0.852,
      "en:std": 0.35509998591945907,
      "group_latin": 0.852,
      "group_latin:std": 0.35509998591945907,
      "score:std": 0.35509998591945907,
      "score": 0.852
    },
    "score": 0.852
  }
]

The nightly gsm8k eval results on NVIDIA Cloud H100 look OK, but the CIs are not very stable, so we can ignore them for now.
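A sanity check on the numbers above: each ":std" field is consistent with the population standard deviation of a 0/1 per-sample correctness indicator at the reported accuracy p, i.e. sqrt(p * (1 - p)). A quick stdlib sketch (not part of the eval harness):

```python
import math

def bernoulli_std(p: float) -> float:
    """Population std of a 0/1 correctness indicator with mean accuracy p."""
    return math.sqrt(p * (1 - p))

# Llama-3.1-8B-Instruct reports score 0.836 with std ~0.37028;
# Mistral-7B-Instruct-v0.3 reports 0.604 with std ~0.48906.
print(bernoulli_std(0.836))
print(bernoulli_std(0.604))
```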

@zhyncs merged commit 3b87886 into main on Nov 17, 2024 (9 of 12 checks passed) and deleted the upd branch at 16:06.
@zhyncs (Member Author) commented Nov 17, 2024

By the way, I only verified on NVIDIA Cloud H100. Could you help verify on AMD and Intel? cc @HaiShaw @liangan1

@zhyncs (Member Author) commented Nov 17, 2024

After this PR was merged, all other CIs passed except https://github.com/sgl-project/sglang/actions/runs/11880304446/job/33103293929:

Writing report to /tmp/mmlu_meta-llama_Llama-3.1-8B-Instruct.html
Traceback (most recent call last):
  File "/actions-runner/_work/sglang/sglang/test/srt/test_triton_attention_backend.py", line 29, in test_latency
    assert output_throughput > 153, f"{output_throughput=}"
AssertionError: output_throughput=151.8

The measured throughput is very close to the threshold, so this can be ignored for now; issue #2059 tracks it.
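The failing check comes down to a hard throughput floor; a minimal sketch of the assertion pattern (the function name here is hypothetical, while the threshold and the f-string message format come from the traceback above):

```python
def check_output_throughput(output_throughput: float, threshold: float = 153.0) -> None:
    # Raises AssertionError when the measured output throughput (tokens/s)
    # falls below the hard-coded floor, as in the CI run above (151.8).
    assert output_throughput > threshold, f"{output_throughput=}"

check_output_throughput(155.2)  # above the floor: passes silently
```

A fixed floor like this makes the test sensitive to small run-to-run variance, which is why a 151.8 vs 153 result is treated as noise rather than a regression.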

merrymercy added a commit that referenced this pull request Nov 17, 2024